Application/Control Number: 09/852,620

Art Unit: 2676 

DETAILED ACTION 



1. This action is responsive to communications of application filed 5/11/2001.

2. The disposition of the claims is as follows: claims 1-39 are pending in the application.
Claims 1, 6, 11, 16, 20, 24 and 28-39 are independent claims.



Information Disclosure Statement 

3. The information disclosure statement filed 5/11/2001 fails to comply with 37 CFR
1.98(a)(2), which requires a legible copy of each U.S. and foreign patent; each publication or that
portion which caused it to be listed; and all other information or that portion which caused it to
be listed. Patent serial number 09/852,620 is listed but not present.

4. Both IDS submissions are missing form PTO-1449, therefore no signed PTO-1449 is
enclosed.



Specification 



5. The disclosure is objected to because of the following informalities: On p. 19, ln. 4
references "vertexes", but in ln. 9 reference is made to "vortexes". Ln. 9 should most likely be
"vertexes".



Appropriate correction is required. 



Claim Objections 
6. Claim 28 is objected to because of the following informalities:
Preamble of claim repeats itself with the following, "An article of manufacture comprising a
computer usable medium having computer readable program code means embodied therein".



Appropriate correction is required. 



Claim Rejections - 35 USC § 112 

7. The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the
subject matter which the applicant regards as his invention.

8. Claims 6 and 9 are rejected under 35 U.S.C. 112, second paragraph, as being indefinite
for failing to particularly point out and distinctly claim the subject matter which applicant 
regards as the invention. 

9. Applicant's use of the term "reference frame" lacks clarity as to whether "reference 
frame" is to be interpreted as using a particular image frame as a starting or base reference or if 
"reference frame" is to be interpreted as using any particular coordinate system as a reference 
frame. 

Claim Rejections - 35 USC § 102 

10. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the 
basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed publication in this 
or a foreign country, before the invention thereof by the applicant for a patent. 

11. Claims 1-10, 28, 29, 34, and 35 are rejected under 35 U.S.C. 102(a) as being disclosed by
Lee et al., (US Patent Publication 2001/0048753 A1), hereafter Lee.
A. Claim 1, 'A method of describing object region data about an object in video data over a
plurality of frames, said method comprising: ' is disclosed by Lee in para. 16, 41-42, 45, 49, 51,
and 73 at "[0016] Preferably, the user is presented with a graphical user interface showing a 
frame of video data, and the user identifies, with a mouse, pen, tablet, etc., the rough outline of 
an object by selecting points around the perimeter of the object. Curve-fitting algorithms can be 
applied to fill in any gaps in the user-selected points. After this initial segmentation of the 
object, the unsupervised tracking is performed. During unsupervised tracking, the motion of the 
object is identified from frame to frame. The system automatically locates similar semantic 
video objects in the remaining frames of the video sequence, and the identified object boundary 
is adjusted based on the motion transforms."; 

"[0041] FIG. 1 shows the two basic steps of the present system of semantic video object 
extraction. In the first step 100, the system needs a good semantic boundary for the initial frame, 
which will be used as a starting 2D-template for successive video frames, 'approximating the 
object using a figure for each of said frames; ' During this step a user indicates 110 the rough
boundary of a semantic video object in the first frame with an input device such as a mouse, 
touch sensitive surface, pen, drawing tablet, or the like. Using this initial boundary, the system 
defines one boundary lying inside in the object, called the In boundary 102 and another boundary 
lying outside the object, called Out boundary 104. These two boundaries roughly indicate the 
representative pixels inside and outside the user-identified semantic video object. These two 
boundaries are then snapped 106 into a precise boundary that identifies an extracted semantic 
video object boundary. Preferably the user is given the opportunity to accept or reject 112,114 
the user selected and computer generated outlines. [0042] The goal of the user assistance is to
provide an approximation of the object boundary by just using the input device, without the user
having to precisely define or otherwise indicate control points around the image feature.
Requiring precise identification of control points is time consuming, as well as limiting the
resulting segmentation by the accuracy of the initial pixel definitions. A preferred alternative to
such a prior art method is to allow the user to identify and portray the initial object boundary
easily and not precisely, and then have this initial approximation modified into a precise
boundary."

"[0045] Both steps 100 and step 108 require the snapping of an approximate boundary to
a precise one. As described below, a morphological segmentation can be used to refine the
initial user-defined boundary (step 110) and the motion compensated boundary (S.sub.O) to get
the final precise boundary of the semantic video object."

"[0049] The second is a contour-based method in which a user only indicates control
points 'extracting a plurality of points representing the figure for each of said frames; ' along the
outline of an object boundary, and splines or polygons are used to approximate a boundary based
upon the control points, 'approximating the object using a figure for each of said frames; ' The
addition of splines is superior over the first method because it allows one to fill in the gaps
between the indicated points. The drawback, however, is that a spline or polygon will generally
produce a best-fit result for the input points given. With few points, broad curves or shapes will
result. Thus, to get an accurate shape, many points need to be accurately placed about the image
feature's true boundary. But, if it is assumed n nodes guarantees a desired maximal boundary
approximation error of e pixels, at a minimum the user must then enter n keystrokes to define a
border. For complex shapes, n may be a very large number. In order to avoid such reduce user
effort, n can be decreased, but this approach yields larger e values."

"[0051] As shown, a user has marked, with white points, portions of the left image 148 to 
identify an image feature of interest. Although it is preferable that the user define an entire 
outline around the image feature, doing so is unnecessary. As indicated above, gaps in the
outline will be filled in with the hybrid pixel-polygon method. The right image 150 shows the
initial object boundary after gaps in the initial outline of the left image 148 have been filled in.
By allowing the user to draw the outline, the user is able to define many control points without
the tedium of specifying each one individually, 'extracting a plurality of points representing the
figure for each of said frames; ' In the prior art, allowing such gaps in the border required a
tradeoff between precision and convenience. The present invention avoids such a tradeoff by
defining In and Out boundaries and modifying them to precisely locate the
actual boundary of the (roughly) indicated image feature."

"[0073] When there are no more pixels to classify, pixels assigned to a In marker are 
pixels interior to the image feature (semantic video object) defined by the user (FIG. 1, step 110),
and pixels assigned to an Out marker are similarly considered pixels exterior to the semantic
object. As with pixel-wise classification, the locations where the In and Out pixel regions meet
identifies the semantic object's boundary. The combination of all In pixels constitutes the
segmented semantic video object."

"[0085] Returning to FIG. 8, after prediction 350, the next step is motion estimation 352.
It is somewhat axiomatic that a good estimation starts with a good initial setting. By recognizing
that in the real world the trajectory of an object is generally smooth, this information can be
applied to interpreting recorded data to improve compression efficiency. For simplicity, it is
assumed that the trajectory of a semantic video object is basically smooth, and that the motion
information in a previous frame provides a good guess basis for motion in a current frame.
'approximating trajectories with functions, the trajectories being obtained by arranging, in the
frames advancing direction, position data about one of said plurality of points and relative
position data about remaining points with reference to said one of said plurality of points; '
Therefore, the previous motion parameters can be used as the starting point of the current motion
estimation process. (Note, however, that these assumptions are for simplicity, and all
embodiments need not have this limitation.) For the first motion estimation, since there is no
previous frame from which to extrapolate, the initial transformation is set to a=e=1, and
b=c=d=f=g=h=0."

"[0021] A motion transformation function 'and describing the object region data using
the functions. ' representing the transformation between the object in the first frame and the
object of the second frame, can be applied to the outline to warp it into a new approximate
boundary for the object in the second frame. In subsequent video frames, inner and outer
boundaries are defined for the automatically generated new approximate boundary, and then
snapped to the object. Note that implementations can provide for setting an error threshold on
boundary approximations (e.g. by a pixel-error analysis), allowing opportunity to re-identify the
object's boundary in subsequent frames."
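
For illustration only, the claim 1 limitation 'approximating trajectories with functions' can be pictured as fitting a function of frame number to the successive positions of one extracted point. The Python sketch below is an assumption made for this illustration and is taken neither from Lee nor from the application: it fits a first-order polynomial by least squares, and the names fit_line and evaluate are hypothetical.

```python
def fit_line(frames, values):
    """Least-squares fit of value = slope * frame + intercept (illustrative only)."""
    n = len(frames)
    if n != len(values) or n < 2:
        raise ValueError("need at least two (frame, value) samples")
    mean_t = sum(frames) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in frames)
    if var_t == 0:
        raise ValueError("frame numbers must not all be equal")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(frames, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def evaluate(params, frame):
    """Evaluate a fitted (slope, intercept) pair at a frame number."""
    slope, intercept = params
    return slope * frame + intercept


# Hypothetical trajectory of one point over four frames.
frames = [0, 1, 2, 3]
xs = [10.0, 12.0, 14.0, 16.0]
ys = [5.0, 5.5, 6.0, 6.5]
x_fn = fit_line(frames, xs)  # function parameters describing the x trajectory
y_fn = fit_line(frames, ys)  # function parameters describing the y trajectory
```

In this sketch the two (slope, intercept) pairs stand in for the per-frame coordinates of the point, which is the sense in which a region can be described "using the functions" rather than by listing every position.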

B. Claim 2, 'The method according to claim 1, wherein said object region data comprises 
information representing a range of frames in which the object exists in the video data and 
information identifying the figure approximating the object region. ' is disclosed supra for claim
1 by Lee and in para. 17 at "[0017] Mathematical morphology and global perspective motion
estimation/compensation (or an equivalent object tracking system) is used to accomplish these
unsupervised steps. Using a set-theoretical methodology for image analysis (i.e. providing a
mathematical framework to define image abstraction), mathematical morphology can estimate 
many features of the geometrical structure in the video data, and aid image segmentation. 
Instead of simply segmenting an image into square pixel regions unrelated to frame content (i.e. 
not semantically based), objects are identified according to a semantic basis and their movement 
tracked throughout video frames. This object-based information is encoded into the video data 
stream, and on the receiving end, the object data is used to re-generate the original data, rather 
than just blindly reconstruct it from compressed pixel regions. Global motion estimation is used 
to provide a very complete motion description for scene change from frame to frame, and is 
employed to track object motion during unsupervised processing. However, other motion 
tracking methods, e.g. block-based, mesh-based, parametric estimation motion estimation, and
the like, may also be used." 

C. Claim 3, 'The method according to claim 1, wherein said object region data comprises 
one of information representing related information linking to the object and information 
representing a method of accessing the related information. ' is disclosed supra for claim 1, 
particularly in para. 49. 

D. Claim 4, 'The method according to claim 1, wherein said relative position data are
components of differential vectors between the one of said plurality of points and remaining
points. ' is disclosed supra for claim 1 and in para. 81 at "[0081] The algorithm computes the
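partial derivatives of e.sub.j in the semantic video object with respect to the unknown motion
parameters (a, b, c, d, e, f, g, h). That is,

\[ \frac{\partial e_j}{\partial m_0} = \frac{x_j}{D_j}\frac{\partial I'}{\partial x'}, \;\ldots,\; \frac{\partial e_j}{\partial m_7} = -\frac{y_j}{D_j}\left(x_j'\frac{\partial I'}{\partial x'} + y_j'\frac{\partial I'}{\partial y'}\right) \]

\[ a_{kl} = \sum_j \frac{\partial e_j}{\partial m_k}\frac{\partial e_j}{\partial m_l}, \qquad b_k = -\sum_j e_j\frac{\partial e_j}{\partial m_k} \]

[0082] where D.sub.j is the denominator, I'=F.sub.k', I=F.sub.k-1 and (m.sub.0, m.sub.1,
m.sub.2, m.sub.3, m.sub.4, m.sub.5, m.sub.6, m.sub.7)=(a, b, c, d, e, f, g, h)."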
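
For illustration only, the "differential vectors" recited in claim 4 can be pictured as the offsets of the remaining points from one chosen point. The Python sketch below is an assumption made for this illustration, not Lee's method and not the applicant's implementation; the function names and the sample rectangle are hypothetical.

```python
def differential_vectors(points, ref_index=0):
    """Offsets (dx, dy) of every other point from the point at ref_index."""
    rx, ry = points[ref_index]
    return [(x - rx, y - ry)
            for i, (x, y) in enumerate(points) if i != ref_index]


def restore_points(reference, diffs):
    """Recover absolute positions from the reference point and its offsets."""
    rx, ry = reference
    return [(rx + dx, ry + dy) for dx, dy in diffs]


# Hypothetical rectangle vertices in one frame.
vertices = [(10, 20), (14, 20), (14, 26), (10, 26)]
diffs = differential_vectors(vertices)
restored = restore_points(vertices[0], diffs)
```

Only the reference point carries an absolute position in this sketch; the other points are stored relative to it and can be recovered exactly.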

E. Claim 5, 'The method according to claim 1, wherein said object region data comprises
parameters of the functions. ' is disclosed supra for claim 1 and in para. 86 at "[0086] Once
motion prediction 350 and estimation 352 is computed, the previous boundary is then warped
354 according to the predicted motion parameters (a, b, c, d, e, f, g, h), i.e., the semantic object
boundary in the previous frame (B.sub.i-1) is warped towards the current frame to become the
current estimate boundary (B.sub.i'). Since the warped points generally do not fall on integer
pixel coordinates, an inverse warping process is performed in order to get the warped semantic
object boundary for the current frame. Although one skilled in the art will recognize that
alternate methods may be employed, one method of accomplishing warping is as follows."
F. Claim 6, 'A method of describing object region data about an object in video data over a 
plurality of frames, said method comprising: approximating the object using a figure for each of 
said frames; extracting a plurality of points representing the figure for each of said frames; 
approximating trajectories with functions, the trajectories being obtained by arranging, in the 
frames advancing direction, position data about said plurality of points in a reference frame and 
relative position data about said plurality of points in a succeeding frame with reference to the 
position data about said plurality of points in the reference frame; and describing the object 
region data using the functions. ' is disclosed supra for claim 1 and in para. 74-77 at "[0074] FIG. 
8 is a flowchart showing automatic subsequent-frame boundary tracking, performed after a
semantic video object has been identified in an initial frame, and its approximate boundary
adjusted (i.e. after pixel classification). Once the adjusted boundary has been determined, it is
tracked into successive predicted frames. Such tracking continues iteratively until the next initial
frame 'reference frame' (if one is provided for). Subsequent frame tracking consists of four
steps: motion prediction 350, motion estimation 352, boundary warping 354, and boundary 
adjustment 356. Motion estimation 352 may track rigid-body as well as non-rigid motion. 
[0075] In a given frame sequence, there are generally two types of motion, rigid-body in-place
movement and translational movement. Rigid motion can also be used to simulate non-rigid
motion by applying rigid-motion analysis to sub-portions of an object, in addition to applying 
rigid-motion analysis to the overall object. Rigid body motion can be modeled by a perspective 
motion model. That is, assume two boundary images under consideration are B.sub.k-1(x, y)
which includes a boundary indicating the previous semantic video object, and a current boundary 
indicated by B.sub.k(x', y'). Using the homogeneous coordinates (coordinate 'reference frame ' 
implied), a 2D planar perspective transformation can be described as: 
x'=(a*x+b*y+c)/(g*x+h*y+1)
[0076] y'=(d*x+e*y+f)/(g*x+h*y+1)

[0077] The perspective motion model can represent a more general motion than a translational or
affine motion model, such that if g=h=0 and a=1, b=0, d=0, e=1, then x'=x+c and y'=y+f, which
becomes the translational motion model. Also, if g=h=0, then x'=a*x+b*y+c and y'=d*x+e*y+f,
which is the affine motion model." 
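
For illustration only, the eight-parameter perspective model quoted above, and its reduction to the affine and translational models, can be sketched in Python as follows. The function and constant names are hypothetical and chosen for this illustration; the formulas are the ones quoted from Lee, with the denominator g*x+h*y+1.

```python
def perspective_transform(point, params):
    """Map (x, y) to (x', y') with parameters (a, b, c, d, e, f, g, h)."""
    x, y = point
    a, b, c, d, e, f, g, h = params
    denom = g * x + h * y + 1.0
    if denom == 0:
        raise ValueError("point maps to infinity under these parameters")
    return ((a * x + b * y + c) / denom, (d * x + e * y + f) / denom)


# Initial setting quoted from para. [0085]: a=e=1, b=c=d=f=g=h=0.
IDENTITY = (1, 0, 0, 0, 1, 0, 0, 0)


def translation(c, f):
    """Translational special case: g=h=0, a=e=1, b=d=0."""
    return (1, 0, c, 0, 1, f, 0, 0)


def affine(a, b, c, d, e, f):
    """Affine special case: g=h=0."""
    return (a, b, c, d, e, f, 0, 0)


def warp_boundary(points, params):
    """Apply the same transform to every boundary point."""
    return [perspective_transform(p, params) for p in points]


moved = warp_boundary([(3, 4), (5, 6)], translation(2, -1))
```

With g=h=0 the denominator is 1 for every point, which is why the affine and translational cases need no division in the quoted formulas.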

G. Per dependent claims 7-10, these are directed to a method for performing the method of
dependent claims 2-5, respectively, and therefore are rejected on the same basis as dependent claims 2-5.
H. Per independent claims 28, 29 and 34, 35, these are directed to an article of manufacture
and computer data signal, respectively, for performing the method of independent claims 1 and
6, respectively, and therefore are rejected on the same basis as independent claims 1 and 6.






Claim Rejections - 35 USC § 103 

12. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are
such that the subject matter as a whole would have been obvious at the time the invention was made to a person
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the
manner in which the invention was made.

13. Claims 11-23, 30-32 and 36-38 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Lee et al., (US Patent Publication 2001/0048753 A1), as applied to claims 1-5 above, and
further in view of Jasinschi et al., (US Patent Number 6,504,569 B1), hereafter Jasinschi.
A. Claim 11, 'A method of describing object region data about an object in video data over
a plurality of frames, said method comprising: approximating the object using a figure for each 
of said frames; extracting a plurality of points representing the figure for each of said frames; 
approximating trajectories with functions, the trajectories being obtained by arranging, in the 
frames advancing direction, data indicating positions of said plurality of points; and describing 
the object region data using the functions and depth information of the object. ' is disclosed by 
Lee supra for claim 1. However Lee does not appear to disclose 'describing the object region
data using depth information of the object', but Jasinschi does in col. 1, lns. 37-58 at "(10)
Accordingly the present invention provides a method of generating 2-D extended images from 3-D
data extracted from a video sequence representing a natural scene. In an image pre-processing
stage image feature points are determined and subsequently tracked from frame to frame of the
video sequence. In a structure-from-motion stage the image feature points are used to estimate
three-dimensional object velocity and depth. Following these stages depth and motion
information are post-processed to generate a dense three-dimensional depth map. World
surfaces, corresponding to extended surfaces, are composed by integrating the three-dimensional
depth map information."

Therefore it would have been obvious to one of ordinary skill in the art at the time the 
invention was made to apply object tracking disclosed by Lee in combination with depth 
determining information disclosed by Jasinschi, and motivated to combine the teachings because 
it would provide a method of generating 2-D extended images from 3-D data extracted from a 
two-dimensional video sequence as revealed by Jasinschi in col. 1, lines 32-34. 

B. Per dependent claims 12 and 13, these are directed to a method for performing the
method of dependent claims 2 and 3, respectively, and therefore are rejected on the same basis as claim 11 and
dependent claims 2 and 3.

C. Per dependent claim 14, "The method according to claim 11, wherein said object region
data is described by using the depth information of the object and parameters of the functions. "
is disclosed supra by Lee for claim 4 and supra by Jasinschi for claim 11.

D. Per dependent claim 15, "The method according to claim 11, wherein said depth
information is a relative depth and has a discrete level value. " is disclosed supra by Lee for
claim 4 and supra by Jasinschi for claim 11 and in col. 7, lns. 14-19 at "Step 4: Extract the
camera rotation matrix R and the camera translation vector T from the computed essential matrix
E. Step 5: Given R and T estimate the depth Z.sub.i at every feature point F.sup.i.sub.k. "
E. Claim 16, 'A method of describing object region data about an object in video data over
a plurality of frames, said method comprising: approximating the object using a figure for each 
of said frames; extracting a plurality of points representing the figure for each of said frames; 
approximating trajectories with functions, the trajectories being obtained by arranging, in the 
frames advancing direction, data indicating positions of said plurality of points; and describing 
the object region data using the functions and display flag information indicating a range of 
frames in which the object or each of said points is visible or not. ' is disclosed by Lee supra for 
claim 1. However Lee does not appear to disclose 'display flag information indicating a range of 
frames in which the object or each of said points is visible or not. ', but Jasinschi does in col. 4,
lns. 20-28 at "The inputs to the 3-D camera parameter estimator 16 are raw video images,
denoted by I.sub.k, and the corresponding "alpha" images, denoted by A.sub.k. The alpha image 
is a binary mask that determines the "valid" regions inside each image, i.e., the regions of interest 
or objects, as shown in FIG. 3 where FIG. 3A represents an image I.sub.k from a tennis match 
and FIG. 3B represents the alpha image A.sub.k for the background object with the tennis player 
blanked out." Wherein alpha images A.sub.k correspond to display flag information for valid
regions. 
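
For illustration only, the mapping drawn above between a binary alpha mask and 'display flag information indicating a range of frames' can be sketched in Python. Everything in the sketch is an assumption made for this illustration: neither Lee nor Jasinschi is quoted as computing frame ranges this way, and the names frame_has_object and visible_ranges are hypothetical.

```python
def frame_has_object(alpha_mask):
    """True if any pixel of a binary mask (list of rows) is set."""
    return any(any(row) for row in alpha_mask)


def visible_ranges(flags):
    """Collapse per-frame visibility flags into inclusive (start, end) frame ranges."""
    ranges = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i                      # a visible run begins
        elif not flag and start is not None:
            ranges.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        ranges.append((start, len(flags) - 1))
    return ranges


# Hypothetical 2x2 masks for five frames.
masks = [
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[1, 1], [0, 0]],
    [[0, 0], [0, 0]],
    [[0, 0], [1, 0]],
]
flags = [frame_has_object(m) for m in masks]
ranges = visible_ranges(flags)
```

The sketch shows the extra step needed to go from a per-frame mask to a range of frames: the per-frame flags have to be grouped into runs.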

Therefore it would have been obvious to one of ordinary skill in the art at the time the 
invention was made to apply object tracking disclosed by Lee in combination with alpha images 
A.sub.k disclosed by Jasinschi, and motivated to combine the teachings because it would 
provide a method of generating 2-D extended images from 3-D data extracted from a
two-dimensional video sequence as revealed by Jasinschi in col. 1, lines 32-34.
F. Per dependent claims 17 and 18, these are directed to a method for performing the
method of dependent claims 2 and 3, respectively, and therefore are rejected on the same basis as claim 16 and
dependent claims 2 and 3.

G. Per dependent claim 19, "The method according to claim 16, wherein said object region
data is described by using the display flag information and parameters of the functions. " is
disclosed supra by Lee and Jasinschi for claim 16, wherein alpha images A.sub.k
correspond to display flag information for valid regions.

H. Claim 20, 'A method of describing object region data about an object in video data over 
a plurality of frames, said method comprising: approximating the object using a figure for each 
of said frames; extracting a plurality of points representing the figure for each of said frames;
approximating trajectories with functions, the trajectories being obtained by arranging, in the 
frames advancing direction, data indicating positions of said plurality of points; and describing 
the object region data using the functions and object passing range information indicating a 
range where the figure approximating the object exist over said plurality of frames. ' is disclosed 
by Lee and Jasinschi supra for claim 11. In particular Lee discloses 'describing the object region
data using the functions and object passing range information indicating a range where the 
figure approximating the object exist over said plurality of frames. ' in para. 90-93 at "Sample 
Output 

[0090] FIGS. 11-13 show sample output from the semantic video object extraction system for 
several video sequences. These sequences represent different degrees of extraction difficulty in 
real situations. To parallel the operation of the invention, the samples are broken to parts, the
first representing initial frame (user assisted) segmentation results, and the second subsequent
frame (automatic) tracking results. 

[0091] The three selected color video sequences are all in QCIF format (176.times.144) at 30 Hz. 
The first Akiyo 450 sequence contains a woman sitting in front of a still background. The 
motion of the human body is relatively small. However, this motion is a non-rigid body motion 
because the human body may contain moving and still parts at the same time. The goal is to 
extract the human body 452 (semantic video object) from the background 454. The second 
Foreman 456 includes a man 458 talking in front of a building 460. This video data is more 
complex than Akiyo due to the camera being in motion while the man is talking. The third video 
sequence is the well-known Mobile-calendar sequence 462. This sequence has a moving ball 
464 that is traveling over a complex background 466. This sequence is the most complex since 
the motion of the ball contains not only translational motion, but also rotational and zooming 
factors. 

[0092] FIG. 11 shows initial frame segmentation results. The first row 468 shows an initial
boundary obtained by user assistance; this outline indicates an image feature within the video 
frame of semantic interest to the user. The second row 470 shows the In and Out boundaries 
defined inside and outside of the semantic video object. For the output shown, the invention was 
configured with a size of 2 for the square structure element used for dilation and erosion. The 
third row 472 shows the precise boundaries 474 located using the morphological segmentation 
tool (see FIG. 6 above). The fourth row 476 shows the final extracted semantic objects.
[0093] FIG. 12 shows subsequent frame boundary tracking results. For the output shown, the
tracking was done at 30 Hz (no skipped frames). Each column 478, 480, 482 represents four
frames randomly chosen from each video sequence. FIG. 13 shows the corresponding final
extracted semantic video objects from the FIG. 12 frames. As shown, the initial precise 
boundary 474 has been iteratively warped (FIG. 8, step 354) into a tracked 484 boundary 
throughout the video sequences; this allows implementations of the invention to automatically 

extract user-identified image features." 
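
For illustration only, the quoted use of dilation and erosion with a square structure element to obtain regions outside and inside a rough outline can be sketched in Python. The sketch is an assumption made for this illustration and is not Lee's implementation: it uses a centered (2*radius+1) square window rather than the size-2 element mentioned in para. [0092], it treats pixels beyond the image edge as background, and the function names are hypothetical.

```python
def dilate(mask, radius=1):
    """Binary dilation: a pixel is set if any pixel in its square window is set."""
    h, w = len(mask), len(mask[0])
    return [[int(any(mask[rr][cc]
                     for rr in range(max(0, r - radius), min(h, r + radius + 1))
                     for cc in range(max(0, c - radius), min(w, c + radius + 1))))
             for c in range(w)] for r in range(h)]


def erode(mask, radius=1):
    """Binary erosion: a pixel stays set only if its whole square window is set."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= rr < h and 0 <= cc < w and mask[rr][cc]
                     for rr in range(r - radius, r + radius + 1)
                     for cc in range(c - radius, c + radius + 1)))
             for c in range(w)] for r in range(h)]


# Hypothetical rough object: a 3x3 block centered in a 5x5 frame.
rough = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
         for r in range(5)]
inner = erode(rough)    # region safely inside the rough outline
outer = dilate(rough)   # region reaching safely outside the rough outline
band = [[outer[r][c] - inner[r][c] for c in range(5)] for r in range(5)]
```

In this sketch the band between the eroded and dilated regions is where a precise boundary would have to be located; how Lee snaps the boundary within that band is described in the quoted passages, not in this code.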

Therefore it would have been obvious to one of ordinary skill in the art at the time the 
invention was made to apply object and passing range tracking disclosed by Lee in combination
with ranging information disclosed by Jasinschi, and motivated to combine the teachings because it
would provide a method of generating 2-D extended images from 3-D data extracted from a
two-dimensional video sequence as revealed by Jasinschi in col. 1, lines 32-34.
I. Per dependent claims 21 and 22, these are directed to a method for performing the
method of dependent claims 2 and 3, respectively, and therefore are rejected on the same basis as claim 20 and
dependent claims 2 and 3.

J. Per dependent claim 23, "The method according to claim 20, wherein said object region
data is described by using the object passing range information and parameters of the 
functions. " is disclosed supra by Lee and Jasinschi for claim 20 supra and exemplified by Lee. 
K. Per independent claims 30-32 and 36-38, these are directed to an article of manufacture
and computer data signal, respectively, for performing the method of independent claims 11, 16,
and 20, respectively, and therefore are rejected on the same basis as independent claims 11, 16, and 20.
14. Claims 24-27, 33 and 39 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Lee et al., (US Patent Publication 2001/0048753 A1), as applied to claims 1-5 above, and further
in view of "Panoramic Image Mosaics", Heung-Yeung Shum, hereafter Shum.
A. Claim 24, 'A method of describing object region data about an object moving in a 
panorama image formed by combining a plurality of frames with being overlapped, said method 
comprising: approximating the object in the panorama image using a figure; extracting a 
plurality of points representing the figure in a coordinate system of the panorama image; 
approximating trajectories with functions, the trajectories being obtained by arranging, in the 
frames advancing direction, data indicating positions of said plurality of points; and describing 
the object region data using the functions. ' is disclosed by Lee supra for claim 1. However Lee 
does not disclose 'panorama image formed by combining a plurality of frames with being 
overlapped, said method comprising: approximating the object in the panorama image using a 
figure; extracting a plurality of points representing the figure in a coordinate system of the 
panorama image ', but Shum does in abstract and last paragraph of p. 2, at "This paper presents 
some techniques for constructing panoramic image mosaics from sequences of images. Our 
mosaic representation associates a transformation matrix with each input image, rather than 
explicitly projecting all of the images onto a common surface (e.g., a cylinder). In particular, to
construct a full view panorama, we introduce a rotational mosaic representation that associates a 
rotation matrix (and optionally a focal length) with each input image. A patch-based alignment 
algorithm is developed to quickly align two images given motion models. Techniques for 
estimating and refining camera focal lengths are also presented.
In order to reduce accumulated registration errors, we apply global alignment (block adjustment)
to the whole sequence of images, which results in an optimally registered image mosaic. To
compensate for small amounts of motion parallax introduced by translations of the camera and
other unmodeled distortions, we develop a local alignment (deghosting) technique which warps
each image based on the results of pairwise local image registrations. By combining both global
and local alignment, we significantly improve the quality of our image mosaics, thereby enabling
the creation of full view panoramic mosaics with hand-held cameras. 
We also present an inverse texture mapping algorithm for efficiently extracting environment 
maps from our panoramic image mosaics. By mapping the mosaic onto an arbitrary texture- 
mapped polyhedron surrounding the origin, we can explore the virtual environment using 
standard 3D graphics viewers and hardware without requiring special-purpose players. 
Third, any deviations from the pure parallax-free motion model or ideal pinhole (projective) 
camera model may result in local misregistrations, which are visible as a loss of detail or
multiple images (ghosting). To overcome this problem, we compute local motion estimates
(block-based optical flow) between pairs of overlapping images, and use these estimates to warp
each input image so as to reduce the misregistration."

Therefore it would have been obvious to one of ordinary skill in the art at the time the 
invention was made to apply object tracking disclosed by Lee in combination with panorama
image mosaics disclosed by Shum, and motivated to combine the teachings because it would 
provide a technique for constructing panoramic image mosaics from sequences of images as 
disclosed by Shum in abstract.
B. Per dependent claims 25-27, these are directed to a method for performing the method of
dependent claims 2, 3, and 5, respectively, and therefore are rejected on the same basis as claim 24 and
dependent claims 2, 3, and 5.

C. Per independent claims 33 and 39, these are directed to an article of manufacture and
computer data signal, respectively, for performing the method of independent claim 24 and
therefore are rejected on the same basis as independent claim 24.



Responses 

15. Responses to this action should be mailed to: Commissioner of Patents and Trademarks, 
Washington, D.C. 20231. If applicant desires to fax a response, (703) 308-9051 may be used for 
formal communications or (703) 308-6606 for informal or draft communications. 

Please label "PROPOSED" or "DRAFT" for informal facsimile communications.
Hand-delivered responses should be brought to Crystal Park II, 2121 Crystal Drive, Arlington, VA,
Sixth Floor (Receptionist).

Inquiries

16. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Greg Cunningham whose telephone number is (703) 308-6109.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Matthew Bella, can be reached on (703) 308-6829. 
Any response to this action should be mailed to:

Commissioner of Patents and Trademarks
Washington, D.C. 20231

or faxed to:

(703) 872-9314 (for Technology Center 2600 only)

Hand-delivered responses should be brought to Crystal Park II, 2121 Crystal Drive,
Arlington, VA, Sixth Floor (Receptionist).

Any inquiry of a general nature or relating to the status of this application or proceeding 
should be directed to the Technology Center 2600 Customer Service Office whose telephone 
number is (703) 306-0377.